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Abstract 

A  new  region-based  approach  to  nonrigid  motion  tracking 
is  described.  Shape  is  defined  in  terms  of  a  deformable 
triangular  mesh  that  captures  object  shape  plus  a  color 
texture  map  that  captures  object  appearance.  Photometric 
variations  are  also  modeled.  Nonrigid  shape  registration 
and  motion  tracking  are  achieved  by  posing  the  problem  as 
an  energy-based,  robust  minimization  procedure.  The  ap¬ 
proach  provides  robustness  to  occlusions,  wrinkles,  shad¬ 
ows,  and  specular  highlights.  The  formulation  is  tailored 
to  take  advantage  of  texture  mapping  hardware  available 
in  many  workstations,  PC's,  and  game  consoles.  This  en¬ 
ables  nonrigid  tracking  at  speeds  approaching  video  rate. 

1  Introduction 

A  key  open  problem  in  tracking  is  that  of  encoding  and 
comparing  shapes  as  they  undergo  nonrigid  deformation. 
Simply  providing  robustness  to  nonrigid  deformation  is 
insufficient,  because  deformation  often  provides  impor¬ 
tant  information  about  how  shapes  are  related.  To  make 
things  worse,  tracking  must  also  cope  with  possible  light¬ 
ing  changes,  specular  highlights,  shadows,  and  occlusions. 

In  images,  the  motion  of  objects  is  sometimes  due  to 
changes  in  viewing  geometry:  e.g.,  projective  effects,  or 
change  in  object  pose.  In  many  such  cases,  a  simple  affine 
model  or  eight  parameter  projective  deformation  model  is 
sufficient  to  encode  the  resulting  image  motions.  However, 
in  general,  these  parameterizations  are  inadequate  for  rep¬ 
resenting  motions  that  arise  due  to  a  physical  deformation. 
For  instance,  most  biological  objects  are  flexible  and  artic¬ 
ulated:  fingers  bend,  cheeks  bulge,  fish  swim,  trees  sway  in 
the  breeze,  etc.  Shapes  are  stretched,  bent,  tapered,  dented, 
etc.,  and  so  it  seems  logical  to  employ  a  model  that  can 
encode  the  ways  in  which  real  objects  deform. 

1.1  Related  Work 

This  rationale  led  to  the  physics-inspired  modeling 
paradigm  of  active  contours  or  snakes  [18].  A  snake  has 
a  predefined  structure  that  incorporates  prior  knowledge 
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about  a  contour's  smoothness  and  its  resistance  to  defor¬ 
mation. 

While  snakes  enforced  constraints  on  smoothness  and 
the  amount  of  deformation,  they  could  not  in  their  original 
form  be  used  to  constrain  the  types  of  deformation  valid  for 
a  particular  problem  domain  or  object  class.  Furthermore, 
it  was  difficult  to  use  snakes  for  recognition  because  of  dif¬ 
ferences  in  sampling  and  parameterization  in  comparing 
recovered  descriptions. 

This  led  to  the  development  of  algorithms  that  enforce 
a  priori  constraints  on  the  types  of  allowable  deforma¬ 
tions  for  motion  tracking  [5,  10],  deformable  templates 
[16,  34,  36],  trainable  snakes  [2,  9],  and  deformable  pro¬ 
totypes  [27].  Such  approaches  provide  a  low-dimensional 
characterization  of  shape  that  enables  recognition  and 
comparison  of  nonrigid  motions. 

Another  promising  family  of  approaches  is  based  on 
correlation  of  deforming  image  patches  [3,  15,  21,  32]. 
These  approaches  integrate  information  over  an  image  re¬ 
gion,  and  therefore  tend  to  be  more  immune  to  noise  and/or 
low-contrast,  especially  if  a  robust  estimator  formulation  is 
employed  [4].  To  date,  most  correlation-based  models  for 
nonrigid  motion  require  off-line  processing,  though  multi- 
scale  techniques  offer  some  hope  for  realtime  performance 
[15,  32].  Real-time  approaches  for  tracking  of  parameter¬ 
ized  patches  have  been  developed  [12,  13];  however,  these 
methods  do  not  address  general  nonrigid  motion  tracking. 

1.2  New  Approach:  Active  Blobs 

To  address  the  general  nonrigid  tracking  problem,  we  will 
introduce  a  new  deformable  model  formulation:  active 
blobs.  Active  blobs  employ  a  texture-mapped  triangular 
mesh  model  for  tracking  deforming  shapes  in  color  images. 

The  goal  is  to  robustly  track  nonrigidly  deforming 
shapes  at  speeds  approaching  video  frame-rate  on  a  stan¬ 
dard  graphics  workstation.  To  gain  better  robustness,  ac¬ 
tive  blobs  incorporate  information  about  not  only  shape, 
but  also  color  image  appearance.  Active  blobs  also  pro¬ 
vide  some  robustness  to  photometric  variations,  includ¬ 
ing  specular  highlights  and  shadows.  By  taking  advantage 
of  texture  mapping  hardware  available  in  many  worksta¬ 
tions,  active  blobs  have  achieved  peak  rates  of  over  twelve 
frames/sec.  on  an  SGI  Indigo2  Impact  RIOK. 
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Figure  1:  Model  construction  using  a  color  image.  From 
left  to  right:  a.)  input  image  with  region  of  interest  over- 
laid,  b.)  resulting  triangle  mesh,  c.)  texture  mapped  model. 


2  The  Basic  Idea 

The  construction  of  an  example  active  blob  model  is  shown 
in  Fig.  1.  The  input  to  blob  construction  is  an  example  im¬ 
age  plus  segmentation  information  —  provided  as  a  binary 
support  region  or  as  a  contour  that  encloses  the  shape.  The 
input  can  also  include  interior  feature  points  to  be  used  as 
nodes  in  the  triangular  mesh.  In  this  example,  the  user  cir¬ 
cled  the  object  of  interest. 

A  2D  active  blob  model  is  then  constructed  using  a 
modified  Delaunay  triangular  meshing  algorithm.  To  de¬ 
form  the  model,  we  deform  this  mesh.  Nonrigid  defor¬ 
mation  of  the  mesh  can  be  specified  in  terms  of  paramet¬ 
ric  functions;  e.g.,  affine  deformations,  eight  parameter 
projective  deformations,  application-specific  deformations 
[3],  finite  element  modal  deformations  [22,  28],  or  princi¬ 
pal  deformations  derived  from  a  statistical  analysis  over  a 
training  set  of  shapes  [9,  20,  21]. 

The  blob’s  appearance  is  then  captured  as  a  color  tex¬ 
ture  map  and  applied  directly  to  the  triangulated  model.  A 
blob  warp  is  defined  as  a  deformation  of  the  mesh  and  then 
a  bilinear  resampling  of  the  texture  mapped  triangles.  By 
defining  image  warping  in  this  way,  it  is  possible  to  har¬ 
ness  hardware  accelerated  triangle  texture  mapping  capa¬ 
bilities  becoming  prevalent  in  mid-end  workstations,  PC'  s, 
and  computer  game  consoles  (e.g.,  Nintendo  64). 

Tracking  is  then  posed  as  the  problem  of  active  blob 
registration.  The  registration  procedure  minimizes  a  func¬ 
tion  that  accounts  for  both  the  priors  on  shape  (the  defor¬ 
mation  parameters)  and  the  priors  on  appearance  (the  color 
texture  map).  Through  the  use  of  a  robust  error  norm,  reg¬ 
istration  can  be  made  robust  to  specular  highlights,  shad¬ 
ows,  some  photometric  variations,  and  small  occlusions. 
Furthermore,  the  use  of  color  imagery  enables  tracking  in 
situations  where  grayscale  tracking  might  be  less  robust. 

An  example  of  nonrigid  tracking  with  an  active  blob  is 
shown  in  Fig.  2.  The  user  defined  a  rectangular  region  of 
interest.  A  finite  element  modal  parameterization  was  then 
employed  for  tracking.  As  can  be  seen,  the  blob  model 
tracks  the  bag  of  candy  quite  well,  despite  nonrigid  defor¬ 
mation,  wrinkles,  shadows,  and  specular  highlights. 


Figure  2:  Nonrigid  tracking  with  an  active  blob.  This  figure 
shows  every  fifteenth  frame  in  a  tracking  sequence.  For  vi¬ 
sualization  purposes,  an  outline  of  the  active  blob  is  shown 
overlaid  on  the  input  images  in  the  top  row.  The  resulting 
active  blob  tracking  is  shown  below  each  input  image. 

3  Active  Blob  Formulation 

In  the  active  blobs  formulation,  nonrigid  deformation  is 
controlled  by  parametric  functions.  These  functions  are 
applied  to  the  node  points  that  define  the  active  blob's  2D 
triangle  mesh.  Image  warping  and  interpolation  are  ac¬ 
complished  by  displacing  the  mesh  vertices  and  then  re¬ 
sampling  using  bilinear  interpolation.  Thus  we  define  a 
warping  function  for  an  input  image,  I: 

(1) 

where  u  is  a  vector  containing  warping  parameters,  and  V 
is  the  resulting  warped  image. 

The  warping  parameters  control  functions  that  deform 
the  triangle  mesh  via  displacement  at  the  mesh  node  points: 

X'  =  /(X,u),  (2) 

where  the  vector  X  contains  the  triangle  node  point  loca¬ 
tions  Xi,  and  X'  contains  the  resulting  displaced  nodes. 

Perhaps  the  simplest  warping  functions  to  be  used  in 
Eq.  2  are  those  of  a  2D  affine  model  or  an  eight  parameter 
projective  model  [15,  32].  Unfortunately,  these  functions 
are  only  suitable  for  approximating  the  rigid  motion  of  a 
planar  patch.  The  functions  can  be  extended  to  include  lin¬ 
ear  and  quadratic  polynomials  [3];  however,  the  extended 
formulation  cannot  model  general  nonrigid  motion. 

3.1  Finite  Element  Modes 

A  more  general  parameterization  of  nonrigid  motion  can 
be  obtained  via  the  use  of  the  modal  representation  [22]. 
In  the  modal  representation,  deformation  is  represented  in 
terms  of  eigenvectors  of  a  finite  element  (FE)  model.  The 
underlying  FE  formulation  offers  the  added  advantage  that 
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it  can  be  used  in  obtaining  a  regularized  solution  to  the 
nonrigid  tracking  problem,  since  it  can  enforce  a  priori 
contraints  on  smoothness  and  the  amounts  of  deformation. 

Taken  together,  modes  form  an  orthogonal  basis  set  for 
describing  nonrigid  shape  deformation  [22,  28].  Blob  de¬ 
formation  can  be  expressed  as  the  linear  combination  of 
orthogonal  modal  displacements: 

m 

=  X  +  (3) 

j=l 

where  Uj  is  the  mode’ s  parameter  value,  and  the  eigen¬ 
vector  defines  the  displacement  function  for  the 
modal  deformation. 

The  modes  are  ordered  by  increasing  eigenvalue,  Uj, 
These  eigenvalues  correspond  with  each  mode's  frequency 
of  vibration.  For  a  2-D  problem,  the  first  three  modes  are 
translation  and  linearized  rotation.  The  next  lowest-order 
modes  correspond  qualitatively  with  human's  notions  of 
nonrigid  deformation:  scaling,  stretching,  skewing,  bend¬ 
ing,  etc.  The  rest  are  higher-order  nonrigid  modes.  Such 
an  ordering  of  shape  deformation  allows  us  to  select  which 
types  of  deformations  are  to  be  observed.  For  instance, 
it  may  be  desirable  to  make  tracking  rotation,  position, 
and/or  scale  independent.  To  do  this,  we  ignore  displace¬ 
ments  in  the  appropriate  low-order  modes. 

The  modal  decoupling  also  allows  efficient  and  robust 
solution  to  the  alignment  problem.  By  discarding  high  fre¬ 
quency  modes  the  amount  of  computation  required  can  be 
reduced  without  significantly  altering  tracking  accuracy. 
Discarding  the  highest-frequency  modes  can  also  make 
tracking  more  robust  to  noise  [22]. 

For  a  given  modal  parameter  vector  obtained  in  track¬ 
ing,  we  can  compute  the  strain  energy  associated  with 
modal  deformation: 

m 

Emodal  ~ 

Each  eigenvalue  ujj  defines  the  stiffness  associated  with 
changes  in  a  particular  mode  parameter.  As  will  be  ex¬ 
plained  in  the  next  section,  strain  energy  can  be  used  to 
enforce  the  prior  probabilities  on  the  shape's  deformations. 

3.2  Statistical  Modes 

As  has  been  pointed  out  by  Terzopoulos  [33],  and  by 
others  [7,  8,  20],  there  is  a  well  understood  link  be¬ 
tween  physically-motivated  deformable  models  and  statis¬ 
tical  estimation.  Splines  were  perhaps  some  of  the  first 
“physically-based”  models  employed  in  statistical  estima¬ 
tion  [19];  they  are  particularly  well-suited  to  modeling  data 
sampled  from  a  Markov  Random  Field  (MRF),  with  Gaus¬ 
sian  noise  added  [11].  The  same  principles  hold  true  for 


regularization  [6,  33],  where  the  energies  of  a  physical 
model  can  be  related  directly  with  measurement  and  prior 
probabilities  used  in  Bayesian  estimation  [30]. 

Rather  than  modeling  the  system  as  an  elastic  material, 
we  can  instead  assume  nothing  about  it,  collect  data  sam¬ 
ples  of  the  displacements  of  each  node,  and  then  perform 
a  principal  components  analysis  [9,  20,  21].  The  princi¬ 
pal  directions  are  defined  in  terms  of  the  eigenvectors  and 
eigenvalues  of  the  displacement  covariance  matrix.  Each 
eigenvector  defines  a  principal  deformation,  and  can  be 
used  directly  in  Eq.  3.  The  stiffness  associated  with  each 
mode  is  inversely  proportional  to  its  corresponding  eigen¬ 
value,  and  can  be  used  directly  in  Eq.  4. 

This  connection  leads  to  two  useful  observations.  First, 
using  a  FE  model  is  equivalent  to  making  assumptions 
about  the  distribution  of  samples  we  expect  to  see.  Not  us¬ 
ing  any  model,  just  collecting  data  and  using  statistics,  on 
the  other  hand,  implies  no  a  priori  knowledge  of  this  dis¬ 
tribution  and  instead  represents  an  attempt  to  statistically 
approximate  it  through  experimental  observation. 

3.3  Photometric  Variation 

We  would  also  like  to  derive  a  parameterization  for  model¬ 
ing  brightness  and  contrast  variations.  It  is  possible  to  ac¬ 
count  for  photometric  variation  by  extending  our  previous 
warping  function  to  include  brightness  and  contrast  terms: 

r  =  cW{l,n)  +  b,  (5) 

b{x,y,a)  =  aox -h  aixy  +  a2y as,  (6) 

c{x,  y,  a)  =  a^x  -1-  a^xy  +  a^y  H-  ay,  (7) 

where  a  is  a  vector  of  coefficients  for  bilinear  functions 
that  vary  with  image  coordinates  {x,  y). 

In  our  current  system,  the  photometric  correction  terms 
are  defined  to  scale  the  red,  green,  and  blue  channels 
equally.  Photometric  correction  is  accomplished  via  image 
blending  capabilities  provided  by  the  graphics  workstation. 

3.4  Combined  Parameterization 

Deformation  and  photometric  parameters  can  be  combined 
in  generic  parameter  vector  a.  The  generic  form  allows  us 
to  utilize  active  blobs  with  any  combination  of  the  above 
parameterizations.  The  image  warping  function  becomes: 

I'  =  W(I,a),  (8) 

and  the  deformation  energy  term  becomes: 

m 

E deformation  ~  ^  ^ 

j=l 

where  are  the  stiffnesses  associated  with  each  param¬ 
eter.  If  no  stiffness  value  is  available  for  a  particular  pa¬ 
rameter,  then  it  is  set  to  zero.  The  stiffnesses  can  also  be 
determined  via  statistical  estimation  [8,  20]. 


3 


4  Active  Blob  Registration 

The  goal  of  our  system  is  nonrigid  shape  tracking.  To 
achieve  this,  the  system  recovers  warping  parameters  that 
register  a  template  image  Iq  with  a  stream  of  incoming 
video  images.  The  maximum  likelihood  solution  to  this 
two  image  registration  problem  consists  of  minimizing  the 
squared  error  for  all  the  pixels  within  the  blob: 

1  ^ 

Eimage  =  -  ef  (10) 

i=l 

e,  =  ||I'(a:i,yi)  -  I(a:i,2/i)||,  (11) 

where  I'(xi,  j/i)  is  a  pixel  in  the  warped  template  image  as 
prescribed  in  Eq.  8,  and  I(xi,  yi)  is  the  pixel  at  the  same 
location  in  the  input.  The  above  equation  is  formulated  for 
comparing  two  color  images;  thus,  it  incoq)orates  the  sum 
of  squared  difference  over  all  channels  at  each  pixel 
Traditional  image  registration  can  be  easily  corrupted 
by  outliers.  The  process  can  be  made  less  sensitive  to  out¬ 
liers  if  we  replace  the  quadratic  error  norm  with  an  influ¬ 
ence  function  [14]: 

1 

^image  “  ^  ^  t?*),  (12) 

^  •  1 
t=l 

where  a  is  an  optional  scale  parameter,  and  p  is  the  influ¬ 
ence  function.  Such  functions  are  also  known  as  robust 
error  norms  [4];  they  can  be  used  to  control  the  bias  a  par¬ 
ticular  measurement  has  on  the  registration  solution. 

If  it  is  assumed  that  noise  is  Gaussian  distributed, 
then  the  optimal  error  norm  is  simply  the  quadratic  norm 
p(ei,cr)  =  ef/2(7^.  However,  robustness  to  outliers  can 
be  further  improved  via  the  use  of  a  Lorentzian  influence 
function  [4]: 

/>(ei,cr)=log(l  +  ^)  (13) 

This  norm  replaces  the  traditional  quadratic  norm  found 
in  least  squares.  Using  the  Lorentzian  is  equivalent  to  the 
incorporation  of  an  analog  outlier  process  in  our  objective 
function  [4].  The  formulation  results  in  better  robustness 
to  specular  highlights  and  occlusions.  For  efficiency,  the 
log  function  can  be  implemented  via  table  look-up. 

Equation  1 2  includes  a  data  term  only;  thus  it  only  en¬ 
forces  the  recovered  model's  fidelity  to  the  image  measure¬ 
ments.  The  formulation  can  be  extended  to  include  a  regu¬ 
larizing  term  that  enforces  the  priors  on  the  model  param¬ 
eters  a: 

E  =  -^/5(ei,CT) +  (14) 

"  i=i  j=i 

where  7  is  a  constant  that  controls  the  relative  importance 
of  the  regularization  term,  and  the  ^  are  the  ‘"stiffnesses” 
associated  with  each  model  term  as  described  in  Sec.  3.4. 


41  Marquardt-Levenberg 

Registration  requires  minimization  of  the  residual  error 
with  respect  to  the  deformation  and  lighting  parame¬ 
ters.  A  common  approach  to  multi-dimensional  minimiza¬ 
tion  problems  is  the  Marquardt-Levenberg  method.  This 
method  requires  the  first  and  second  partial  derivatives  of 
E  with  respect  to  the  unknown  model  parameters  a.  For 
the  Lorentzian  error  norm,  the  first  partials  take  the  form: 


dE 

dak 


2  A  dV 

n  “  2(7^  4-  ef  dak 


+270*^1, 


(15) 


The  second  partials  take  the  form: 
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(16) 


The  partial  ^  with  respect  to  a  particular  model  pa¬ 
rameter  ajfe  can  be  approximated  by  adding  small  S  to  that 
parameter,  warping  the  model,  and  then  measuring  the  re¬ 
sulting  change  in  the  residual  error.  All  gradient  calcula¬ 
tions  can  be  made  more  efficient  via  the  use  of  the  available 
graphics  hardware  texture  mapping  capability. 

Following  [23],  the  partials  are  then  used  to  approxi¬ 
mate  Hessian  matrix  H  and  a  weighted  gradient  vector  g. 
It  is  conventional  to  drop  the  image  second  derivative  term 
and  remove  the  factors  of  two,  thereby  defining: 
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+  7^*0*.  (17) 
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The  deformation  and  photometric  correction  parameters 
are  then  iteratively  updated  by  solving  the  linear  system: 


(18) 


(H  +  AI)Aa  =  g,  (19) 


where  A  is  a  stabilization  parameter.  The  stabilization  pa¬ 
rameter  is  initially  set  to  a  large  value.  At  each  iteration, 
the  A  is  increased/decreased  by  a  scale  factor  (typically  an 
order  of  magnitude)  depending  on  whether  the  error  resid¬ 
ual  has  increased  or  decreased. 

This  minimization  procedure  is  iterated  until  the  per¬ 
centage  change  in  the  error  residual  is  below  a  threshold, 
or  the  number  of  iterations  exceeds  some  maximum.  At 
each  iteration,  the  partial  derivatives,  approximate  Hessian 
and  gradient  vector  are  recomputed. 
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4.2  Difference  Decomposition 

Marquardt-Levenberg  requires  the  calculation  of  0{N) 
gradient  images  and  0{N^)  image  products  per  iteration 
of  minimization,  where  N  is  the  number  of  model  param¬ 
eters.  Despite  the  use  of  graphics  hardware  in  acceler¬ 
ating  this  calculation,  this  is  still  the  performance  bottle¬ 
neck,  and  therefore  an  algorithm  that  decreases  the  num¬ 
ber  of  gradient  calculations  is  needed.  As  an  alternative, 
we  can  use  a  difference  decomposition  approach[12].  The 
approach  offers  the  benefit  that  it  requires  the  equivalent 
0(1)  image  gradient  calculations  and  0{N)  image  prod¬ 
ucts  per  iteration. 

In  the  difference  decomposition,  we  define  a  basis  of 
difference  images  generated  by  adding  small  changes  to 
each  of  the  blob  parameters.  Each  difference  image  takes 
the  form: 

bfc=Io-W(Io,n^),  (20) 

where  lo  is  the  template  image,  and  is  the  parameter 
displacement  vector  for  the  basis  image,  b*.  Each  re¬ 
sultant  difference  image  becomes  a  column  in  a  difference 
decomposition  basis  matrix  B.  This  basis  matrix  can  be 
determined  as  a  precomputation. 

During  tracking,  an  incoming  image  I  is  inverse  warped 
into  the  blob's  coordinate  system  using  the  most  recent 
estimate  of  the  warping  parameters  a.  We  then  compute 
the  difference  between  the  inverse- warped  image  and  tem¬ 
plate: 

D  =  Io->V-^(I,a).  (21) 

This  difference  image  D  can  then  be  approximated  in 
terms  of  a  linear  combination  of  the  difference  decomposi¬ 
tion' s  basis  vectors: 

D  «  B^q,  (22) 

where  q  is  a  vector  of  basis  coefficients. 

Thus,  the  maximum  likelihood  estimate  of  q  can  be  ob¬ 
tained  via  least  squares: 

q=  (B^B)-1b’’D.  (23) 

The  change  in  the  image  warping  parameters  is  obtained 
via  matrix  multiplication: 

Aa  =  Nq,  (24) 

where  N  has  columns  formed  by  the  parameter  displace¬ 
ment  vectors  n*  used  in  generating  the  difference  basis. 
The  basis  and  inverse  matrices  can  be  precomputed;  this  is 
the  key  to  the  difference  decomposition's  speed. 


The  difference  decomposition  can  be  extended  to  incor¬ 
porate  the  robust  error  norm  of  Eq.  13: 

^ki^uVi)  =  s\^{hk{xi,yi))^Jp{hk{xuyi),a), 

(25) 

where  hk{xi,yi)  is  the  basis  value  at  the  pixel.  The 
difference  template  D  is  computed  using  the  same  formula. 

Furthermore,  the  formulation  can  be  extended  to  in¬ 
clude  a  regularizing  term  that  enforces  the  priors  on  the 
model  parameters.  This  is  accomplished  using  a  con¬ 
strained  least  squares  formulation.  The  energy  term  to  be 
minimized  takes  the  form: 

E  =  [Bq  “  D]^  [Bq  -  D]  +  7q^Pq,  (26) 
where  P  = 

Differentiating  with  respect  to  q  and  then  rearranging 
terms,  we  obtain  the  constrained  least  squares  solution: 

q=  [B^B  +  7P]’^B^D.  (27) 

If  needed,  this  minimization  procedure  can  be  iterated  at 
each  frame  until  the  percentage  change  in  the  error  residual 
is  below  a  threshold,  or  the  number  of  iterations  exceeds 
some  maximum. 

5  Implementation 

Active  blobs  have  been  implemented  using  both  the  affine 
parameterization  and  the  more  general,  FE  modal  param¬ 
eterization.  For  tracking,  both  the  Marquardt-Levenberg 
and  difference  decomposition  minimization  algorithms 
were  implemented  and  tested.  Details  of  this  implemen¬ 
tation  will  now  be  described. 

Blob  construction  starts  with  the  determination  of  a  sup¬ 
port  region  for  the  object  of  interest.  The  bounding  con- 
tour(s)  for  a  support  region  can  be  extracted  via  a  standard 
4-connected  contour  following  algorithm  [24].  Alterna¬ 
tively,  the  user  can  define  a  bounding  contour  for  a  region 
via  a  sketch  interface.  In  general,  the  number  of  contour 
segments  must  be  reduced.  We  utilize  the  tolerance  band 
approach,  where  the  merging  stage  can  be  iteratively  alter¬ 
nated  with  recursive  subdivision  [17].  In  practice,  a  single 
merging  pass  is  sufficient  if  for  a  user-sketched  boundary. 

The  triangles  are  then  generated  using  an  adaptation  of 
Ruppert’s  Delaunay  refinement  algorithm  [25,  29],  which 
can  produce  consistent  meshes  for  two-dimensional  polyg¬ 
onal  boundaries  that  can  be  concave  and  can  include  holes. 
Interior  node  points  may  also  be  specified;  this  allows  for 
the  inclusion  of  interior  features  in  the  active  blob  model. 
The  algorithm  accepts  two  parameters  that  control  angle 
and  triangle  size  constraints.  To  satisfy  these  constraints, 
additional  interior  vertices  may  be  added  to  the  original 
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polygon  during  mesh  generation.  The  source  code  is  avail¬ 
able  from  http://www,netlib.org/voronoi/. 

Once  a  triangle  mesh  has  been  generated,  a  RGB  color 
texture  map  is  extracted  from  the  example  image.  Each  tri¬ 
angle  mesh  vertex  is  given  an  index  into  the  texture  map 
that  corresponds  to  its  pixel  coordinate  in  the  undeformed 
example  image  Iq.  To  improve  convergence  and  noise  im¬ 
munity  in  tracking,  the  texture  map  is  blurred  using  a  Gaus¬ 
sian  filter.  Texture  map  interpolation  and  rendering  were 
accomplished  using  OpenGL. 

Given  a  triangle  mesh,  the  FE  model  can  be  initial¬ 
ized  using  Gaussian  interpolants  with  finite  support.  Due 
to  space  limitations,  readers  are  directed  to  [26]  for  the 
mathematical  formulation  and  pseudocode.  The  general¬ 
ized  eigenvectors  and  eigenvalues  are  computed  using  code 
from  the  EISPACK  library:  http://www.netlib.org/eispack/. 

If  tracking  is  to  be  accomplished  via  the  difference  de¬ 
composition,  then  the  needed  basis  images  are  precom¬ 
puted.  In  practice,  four  basis  vectors  per  model  parame¬ 
ter  are  sufficient.  For  each  parameter  a^,  these  four  basis 
images  correspond  with  the  difference  patterns  that  result 
by  tweaking  that  parameter  by  ±Si  and  ±26i.  The  factor 
2Si  corresponds  to  the  maximum  anticipated  change  in  that 
parameter  per  video  frame. 

6  Examples 

The  aforementioned  system  was  implemented  on  a  SGI  In- 
digo2  Impact  with  an  RlOK  processor  running  at  195  MHz 
with  192  MB  of  RAM.  This  workstation  provides  texture 
map  acceleration  in  hardware.  All  performance  statistics 
are  reported  for  unoptimized  code.  Experiments  were  con¬ 
ducted  using  both  minimization  methods. 

In  the  first  example.  Fig.  3,  an  active  blob  is  used  to 
track  a  region  on  a  deforming  green  ball.  This  tracking 
was  done  on-line.  In  other  words,  the  system  tracked  using 
the  current  frame  taken  from  live  video  input.  If  tracking 
computation  per  frame  could  not  keep  up  with  full  video 
frame-rate,  then  in-between  frames  were  skipped. 

Frames  from  the  original  video  sequence  are  shown  in 
the  top  row  of  Fig.  3.  The  region  contained  2394  pixels, 
and  deformation  was  modeled  using  the  ten  lowest-order 
modal  parameters.  The  recovered  active  blob  tracking  for 
Marquardt-Levenberg  is  shown  below  each  input  image. 
Using  the  Marquardt-Levenberg  minimization  technique, 
this  example  ran  at  an  average  of  two  frames  a  second,  with 
on  average  2  to  3  iterations  required  for  convergence.  The 
precomputation  time  was  0.13  secs. 

The  same  sequence  was  successfully  tracked  using  the 
difference  decomposition,  at  an  average  rate  of  1 2.4  frames 
per  second.  Precomputation  of  the  difference  basis  and  ma¬ 
trix  inverses  took  six  seconds. 

Figure  4  shows  another  on-line  tracking  example,  this 


time  tracking  a  bag  of  candy.  The  user  circled  the  region 
of  interest,  as  shown  in  the  upper  left  hand  comer  of  Fig.  4. 
The  resulting  active  blob  contained  2680  pixels,  and  defor¬ 
mation  was  modeled  using  the  twenty  lowest-order  modal 
parameters. 

The  figure  shows  every  twentieth  frame  in  the  tracking 
sequence.  The  resulting  difference  decomposition  track¬ 
ing  is  shown  below  each  input  image.  Tracking  was  ac¬ 
complished  at  8.4  frames  per  second  via  the  difference  de¬ 
composition.  The  required  precomputation  for  basis  and 
inverse  matrices  took  21  CPU  seconds.  As  can  be  seen, 
tracking  performs  well,  despite  very  large  deformations 
and  changes  in  orientation,  specular  highlights,  and  mov¬ 
ing  shadows. 

The  same  sequence  was  tracked  using  Marquardt- 
Levenberg,  at  an  average  rate  of  0.9  frames  per  second. 
Tracking  was  less  successful,  diverging  part  way  through 
the  sequence.  This  problem  can  be  solved  when  the 
Marquardt-Levenberg  procedure  is  mn  off-line,  allowing 
tracking  using  all  frames  from  the  video  sequence. 

7  Discussion 

The  active  blobs  formulation  allows  tracking  of  nonrigidly 
deforming  color  regions  at  over  twelve  frames  per  second 
when  the  difference  decomposition  is  employed.  This  is 
because  the  active  blobs  formulation  is  tailored  to  take  ad¬ 
vantage  of  texture  mapping  hardware  available  in  many 
workstations,  PCs,  and  game  consoles.  In  our  experience, 
higher  speed  tracking  is  possible  if  smaller  blobs  are  used. 

As  was  expected,  Marquardt-Levenberg  is  much  slower 
than  the  difference  decomposition.  This  is  due  mainly  to 
the  number  of  image  operations  calculated  per  iteration,  as 
explained  in  Sec.  4.2.  This  makes  Marquardt-Levenberg 
ill-suited  for  on-line  tracking,  unless  motions  are  slow. 

Tracking  speed  is  dependent  on  two  parameters:  num¬ 
ber  of  pixels  included  in  the  blob,  and  number  of  deforma¬ 
tion  parameters  employed  in  tracking.  Our  experience  has 
shown  that  for  either  minimization  method,  the  amount  of 
computation  needed  per  minimization  iteration  scales  ap¬ 
proximately  linearly  with  the  number  of  pixels.  Varying 
the  number  of  triangles  used  in  the  blob  does  not  impact 
tracking  speed  significantly. 

As  seen  in  the  examples,  the  robust  error  norm  enables 
reliable  tracking,  despite  nonrigid  deformations,  photomet¬ 
ric  changes,  and  small  occlusions. 
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Figure  3:  On-line  tracking  of  a  deformable  ball.  Marquardt-Levenberg  minimization  averaged  2  frames/sec,  while  the 
difference  decomposition  averaged  12.4  ffames/sec.  For  visualization  purposes,  an  outline  of  the  active  blob  is  shown 
overlaid  on  the  original  input  images.  The  recovered  active  blob  warps  for  Marquardt-Levenberg  are  shown  below  each 
input  image.  Difference  decomposition  tracked  with  comparable  accuracy. 


Figure  4:  On-line  tracking  a  deforming  plastic  bag  filled  with  candy  using  difference  decomposition.  This  figure  shows 
every  twentieth  frame  in  the  tracking  sequence.  For  visualization  purposes,  an  outline  of  the  active  blob  is  shown  overlaid  on 
the  original  input  images.  Marquardt-Levenberg  minimization  averaged  0.9  frames/sec,  while  the  difference  decomposition 
averaged  8.4  frames/sec.  Results  for  the  difference  decomposition  are  shown  below  each  input  image.  Marquardt-Levenberg 
was  not  able  to  track  the  object  completely  through  this  sequence. 
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